Abstract. In 1952 Lucien Le Cam announced his celebrated result
that, for regular univariate statistical models, sets of points of super- 
efficiency have Lebesgue measure zero. After reviewing the turbulent 
history of early studies of superefficiency, I suggest using the notion 
of computability as a tool for exploring the phenomenon of superef- 
ficiency. It turns out that only computable parameter points can be 
points of superefficiency for computable estimators. This algorithmic 
version of Le Cam's result implies, in particular, that sets of points 
of superefficiency not only have Lebesgue measure zero but are even 
countable. 

Key words and phrases: Asymptotic efficiency, computable estima- 
tors, superefficiency. 
. . . if the true [parameter] value were known,
a system of estimation could be devised 
which would give it with arbitrarily small 
variance; and such a system of estimation 
might happen to be adopted even if the 
true value were unknown. 



Harold Hotelling, from a letter to 
Ronald A. Fisher, 1930 

1. INTRODUCTION 

At the beginning of his recent paper [45] Stephen 
Stigler presents Hodges's famous example of a 
superefficient estimator as a nasty, ugly little fact
that killed Fisher's beautiful theory of efficiency of 
maximum likelihood. Extending and permuting Wol- 
fowitz's [52] classification, we call the three main 
lines of defense against the little fact "exclusion of 
the evil" (eliminating the superefficient estimators
from competition), "deprecation of the evil" (show- 
ing that superefficiency can only happen on a small 
set of parameter points) and "collective responsi- 
bility" (refusing to accept a parameter point as a 
point of superefficiency, or even simply efficiency, 
unless its neighbors are points of efficiency). They 
will be reviewed in Section 2. Our review will be 
rather selective and will end around 1970 — by that 
time the theory of superefficiency for regular para- 
metric models had been essentially completed. 

The rest of this paper concentrates on the second 
line of defense, with a minimal, and very natural, 
admixture of the first line: we will restrict our at- 
tention to the computable estimators. On the other 
hand, we will never assume asymptotic normality 
of our estimators, although our definition of asymp- 
totic efficiency is motivated by comparison with the 
asymptotically normal case. The result that super- 
efficiency can occur only at computable parameter 
points is established in Section 3 as Theorem 1. Sur- 
prisingly, the regularity conditions required for this 
result are relatively simple and easy to check; this is 
discussed in Section 4. 

The notion of computability for real numbers and 
functions of real numbers will be defined and dis- 
cussed in Appendix A. The proof of Theorem 1 will 
be very brief in the part concerning computability, 
and the details will be provided in Appendix A. 
Another appendix, Appendix B, contains a direct
proof of the countability of sets of superefficiency 
not using the notion of computability. 

The absence of superefficiency for computable con- 
tinuous estimators at noncomputable points was first 
established in [49] in the framework of Kolmogorov's 
algorithmic theory of randomness (see, e.g., [24]). 
A serious disadvantage of the algorithmic theory of 
randomness is its unfamiliarity to most statisticians. 
Another disadvantage is that typical results proved 
in the framework of the algorithmic theory of ran- 
domness contain unspecified constants, which mask 
important details. This paper allows noncontinuous 
estimators and avoids using the algorithmic theory 
of randomness. 

There is more than one connection between su- 
perefficiency and computing. This paper applies the 
notion of computability to study superefficiency. In 
the opposite direction, Barron and Hengartner [4] 
use the notion of superefficiency to study the impor- 
tant computational problem of data compression. 

2. FISHER'S PROGRAM AND 
SUPEREFFICIENCY 

In papers [11] and [12] Fisher sketched his influ- 
ential program of establishing the asymptotic effi- 
ciency of maximum likelihood estimators. (See [1, 
44, 45] for background.) Let $\hat\theta_n$ be the maximum
likelihood estimate for a scalar parameter $\theta$ found
from a sample of size n. Fisher implicitly assumed 
regularity conditions that implied the existence of 
maximum likelihood estimates and much more. In 
the general discussion of this section we will not 
mention explicitly the required regularity conditions. 

Fisher's idea was to prove that: 

1. The scaled difference $(\hat\theta_n - \theta) n^{1/2}$ is asymptotically
normal with parameters $(0, 1/I(\theta))$, where
$I(\theta)$ is Fisher's information. [In this paper the
normal distribution $N(\mu, \sigma^2)$ is parameterized by
its expectation $\mu$ and its variance $\sigma^2$.]

2. If another estimator $T_n$ is such that $(T_n - \theta) n^{1/2}$
is asymptotically normal with parameters $(0, v(\theta))$,
then $v(\theta) \ge 1/I(\theta)$.

Fisher proposed several informal arguments for these 
two statements. The first statement was established 
rigorously by Cramér [8]. Later Cramér's regularity
conditions were relaxed, and analogous statements 
were established for methods of estimation different 
from maximum likelihood (such as Bayes estimators
or Weiss and Wolfowitz's [51] maximum probability
estimators). The second statement is wrong if under- 
stood literally, as shown by Hotelling in his letter to 
Fisher (see the epigraph; available on-line [17] and 
quoted by Stigler in [45]). 

The bluntest interpretation of Hotelling's objection
is that, for each parameter value $\theta$, the estimator
that is identically equal to $\theta$,

(1) $T_n := \theta$,

is such that $(T_n - \theta) n^{1/2}$ is asymptotically normal
with parameters $(0, 0)$. Since $0 < 1/I(\theta)$, the parameter
point $\theta$ will be a "point of superefficiency." This
notion of superefficiency was perhaps not particularly
interesting to Fisher and Hotelling, since the
estimator (1) is not consistent at parameter points
different from $\theta$. Hodges's implementation of
Hotelling's idea (probably discovered completely
independently) is to set

(2) $T_n := \begin{cases} \hat\theta_n, & \text{if } |\hat\theta_n - \theta| \ge n^{-1/4}, \\ \theta, & \text{if } |\hat\theta_n - \theta| < n^{-1/4} \end{cases}$

[Le Cam [26], Section 1, with a credit to Hodges 
(1951); Le Cam says that Hodges produced a series 
of examples and gives an example slightly different 
from (2)]. The advantage of Hodges's estimator is 
that it is consistent and, moreover, its asymptotic 
expected squared error is never worse than that of 
the maximum likelihood estimator (at least in the 
case of the Gaussian model with variance 1 considered
by Le Cam). Hodges's estimator may be said
to be superefficient at $\theta$ in the narrow sense: asymptotically,
it beats the maximum likelihood estimator
at $\theta$ and is not worse than the maximum likelihood
estimator at the other parameter points. The estimator
(1) is then superefficient in the wide sense.
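The behavior of Hodges's estimator (2) can be illustrated numerically. The following is only a minimal sketch, assuming the Gaussian model with variance 1, so that the maximum likelihood estimate is the sample mean and has the $N(\theta, 1/n)$ distribution; the function names `hodges` and `scaled_mse`, the number of trials and the seed are hypothetical choices made for this illustration.

```python
import random


def hodges(sample_mean: float, n: int, theta0: float) -> float:
    """Hodges's estimator (2) with the sample mean as the ML estimate."""
    if abs(sample_mean - theta0) >= n ** -0.25:
        return sample_mean  # far from theta0: keep the ML estimate
    return theta0           # close to theta0: snap to theta0


def scaled_mse(theta: float, theta0: float, n: int,
               trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of n * E(T_n - theta)^2 when the truth is theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The mean of n independent N(theta, 1) observations is N(theta, 1/n).
        mean = rng.gauss(theta, n ** -0.5)
        total += n * (hodges(mean, n, theta0) - theta) ** 2
    return total / trials
```

Under these assumptions, `scaled_mse(0.0, 0.0, 10000)` should be close to 0 (superefficiency at the chosen point), whereas `scaled_mse(1.0, 0.0, 10000)` should be close to 1, the value for the sample mean itself.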

We will refer to the three approaches to dealing 
with superefficiency, the lines of defense mentioned 
in Section 1, as the first approach (exclude the evil 
by changing the qualifying rules), the second ap- 
proach (show that the evil, i.e., the set of points of 
superefficiency, is not great), and the third approach 
(declare a parameter point a point of inefficiency if 
some of its neighbors are points of inefficiency). This
appears to be the chronological order of their ap- 
pearance. Some work in broadly the same direction, 
such as that on the Bahadur [2] and Rao ([38], Definitions
2.3-2.6) efficiency of estimators, is of a very
different character and cannot be easily assigned to 
one of the three approaches. 
2.1 Exclusion of the Evil

It appears that the first approach was initiated by 
Fisher himself in 1930, who, in response to Hotelling's 
doubts, gave his "third proof" of the efficiency of the 
maximum likelihood estimator (in the terminology 
of Stigler [45], who points out that of Fisher's three 
proofs this is the only real proof). Fisher consid- 
ered only a finite observation space and restricted 
competition by considering only consistent estima- 
tors that are smooth functions (independent of the 
sample size n) of the observed relative frequencies 
$x_i$. A modification of the "third proof" was published
in [13] (pages 45-46), which considered the
consistent estimators defined by an equation of the
form $\sum_i k_i(\theta) x_i = 0$, the summation being over the
observation space.

A simple proof of Fisher's bound for finite ob- 
servation spaces and consistent estimators that are 
smooth functions of the observed relative frequen- 
cies was given by Rao in 1955 [37] (and reproduced 
in [40]). This proof was extended by Kallianpur and
Rao [21] to the observation space $\mathbb{R}$; they considered
estimators that are Fréchet differentiable functions
of the empirical distribution function. Fréchet differentiability
was weakened to Gâteaux differentiability
by Kallianpur [20].

Another restricted class of estimators (although 
more general than Fisher's) was considered by Ney- 
man [33]. Neyman's overview of known properties of 
maximum likelihood estimators reflects beliefs pre- 
vailing at the time. Wolfowitz starts his review of 
[33] in Mathematical Reviews as follows: 

It is well known that maximum likelihood 
(ML) estimates have, under general condi- 
tions, the following properties: (a) consis- 
tency, (b) asymptotic normality, (c) min- 
imal variance of the limiting distribution. 

The corresponding statement in Neyman's paper is 
more hedged; Neyman refers to the earlier work by 
Hotelling [18] and Doob [9], neither of whom, how- 
ever, discussed (c). 

An important byproduct of the work on the first 
approach was the Cramér-Rao inequality for unbiased
estimators ([14, 36], [8], Section 32.3). It appears
that this result per se is not directly connected
with Fisher's program (as emphasized by Weiss and 
Wolfowitz, [51], pages 10-11). As a consequence, re- 
sults about superefficiency that are based on the 
Cramér-Rao inequality (such as Theorem 1 in [50])
impose regularity conditions on the allowed estimators
that are difficult to interpret.

The first approach has often been criticized. For 
example, Wolfowitz ([52], page 249) writes: 

... to argue that the maximum likelihood 
(m.l.) estimator is best by ruling out some 
of its competitors, is a dangerous if tempt- 
ing procedure. It can easily result in beg- 
ging the entire question. After all, to give 
an example from social life, anyone can 
become the chess champion of his town if 
the better players are arbitrarily declared 
ineligible to compete. Yet what we are 
seeking to establish is that the m.l. esti- 
mator is asymptotically the champion! 

In particular, he objects to the assumption of
asymptotic normality of the estimators admitted
to the competition. This is echoed by Weiss and 
Wolfowitz [51]: 

The problem is, however, to exclude only 
artificial competitors. If we exclude sen- 
sible and practical competitors then any 
claims about the optimality of the m.l. 
or any other estimator are hollow indeed, 
and the theorems proved do not describe 
the physical reality and are not of practi- 
cal value or aesthetic interest. 

In Wolfowitz's [52] terminology, any regularity conditions
imposed on the estimators should be "statistically
operational." He believed that the weak uniform
convergence of $(T_n - \theta) n^{1/2}$ to a random variable
(not necessarily normal) depending on $\theta$ is such
a statistically operational condition. The requirement
of weak uniform convergence was also proposed
by Rao [39] in 1963 (the same year that the results
of [52] were presented at the Seventh All-Soviet
Union Conference on Probability and Mathematical
Statistics). Lehmann [29] suggests the alternative
condition that the variance $v(\theta)$ of the limiting
distribution of $(T_n - \theta) n^{1/2}$ should be a continuous
function of $\theta$. Lehmann notices that his condition
is weaker than the condition of weak uniform convergence
(under mild regularity conditions on the
statistical model; cf. [52], Lemma 2) but also eliminates
superefficiency: this follows immediately from
Le Cam's result, since superefficiency at one point
leads to superefficiency in a neighborhood of that
point when $v$ is continuous.

Pfanzagl [34] develops further Wolfowitz's objec- 
tion against Fisher's program: 
With the same justification with which
Wolfowitz questioned the asymptotic nor- 
mality assumption for the sequence of es- 
timates one could question his assumption 
of weak uniform convergence: Why should 
a statistician confine himself to estimates 
for which the sequence of distributions of 
$n^{1/2}(T_n - \theta)$ converges at all?

Pfanzagl proves the absence of points of superefficiency
for median unbiased estimators
([34], Theorem 1); this result is extended by
Michel [32] to what he calls strongly asymptotically 
median unbiased estimators. 

2.2 Deprecation of the Evil 

The second approach was started by Le Cam's re- 
sult that sets of points of superefficiency have Lebesgue 
measure zero. The earliest version of this result was 
given without proof in the abstract [25] of his dis- 
sertation [26]. The dissertation itself [26] states and 
contains a proof of a stronger version of the result 
(cf. [25], Theorem 2, and [26], Theorem 9). However, 
the proof given in [26] is wrong, as noticed by Wol- 
fowitz [52]; it does not even prove the weaker version 
of [25]. Corrected proofs were given by Le Cam him- 
self [27] and Bahadur [3]. Paper [48] is devoted to 
the history and several proofs of Le Cam's result. 

The main difference between the versions of Le 
Cam's result given in [25] and [26] is that [26] does 
not assume that the estimator T n is asymptotically 
normal, whereas [25] makes this assumption. Le Cam 
[27] and Bahadur [3] revert to asymptotically nor- 
mal estimators. Pfanzagl ([34], Theorem 2) removes 
the condition of asymptotic normality proving a re- 
sult similar to the one claimed in Le Cam [26]. Both 
Le Cam [26] and Pfanzagl [34] assume that some
function of $T_n$ ($n(T_n - \theta)^2$ in [26] and $(T_n - \theta) n^{1/2}$
in [34]) converges weakly to some probability measure.
Therefore, all these papers involve elements of
the first approach.

Whereas Le Cam [25, 26] considers superefficiency 
in the narrow sense, the results given in [3, 27, 34] 
concern superefficiency in an intermediate sense: the 
assumptions made about the estimator $T_n$ imply its
consistency (and more), but it is not required that
the asymptotic variance of $(T_n - \theta) n^{1/2}$ should never
exceed $1/I(\theta)$.

In his paper [26] Le Cam claims that sets of points 
of superefficiency can be uncountable. There is no 
formal contradiction between Le Cam's claim and
this paper's result: in his example (Example 4 in
[26]) Le Cam uses a different, somewhat arbitrary, 
notion of superefficiency. This example will be fur- 
ther discussed in Section 3.3. 

The standard textbook [7], page 305, asserts that 
sets of points of superefficiency are countable. However,
this is simply a slip of the pen, since this statement
is attributed to Le Cam [26], who never makes
it.

2.3 Collective Responsibility 

In the third approach, when evaluating the performance
of an estimator $T_n$ at a parameter point $\theta$,
one takes into account the performance of $T_n$ at parameter
points different from $\theta$. As discussed at the
beginning of this section, there is a whiff of this already
in the standard notion of superefficiency ([26],
Definition 4), as used in the Berkeley group in the
early 1950s: $\theta$ does not qualify as a point of superefficiency
of (1) because $T_n$ is so inefficient, not even
consistent, at all other points.

In the last section of his paper [26] Le Cam proves 
several results that belong to the third approach. 
His Theorem 14 says that the performance of a su- 
perefficient estimator in a shrinking neighborhood 
of a point of superefficiency is poor. His Theorem 
13 states this result in terms of a formal measure of 
performance of an estimator taking into account the 
performance at the neighboring points. 

Another early paper explicitly using the third ap- 
proach is Chernoff's [5]. Theorem 1 of that paper, 
in Chernoff's words, "states that for an arbitrary 
estimate the reciprocal of the information is 'essen- 
tially' asymptotically a lower bound for the asymp- 
totic variance." The word "essentially" refers to tak- 
ing the supremum of the asymptotic variances (suit- 
ably modified) over a shrinking neighborhood of the 
given parameter point. 

The culmination of this line of work was Hájek's
[16] local asymptotic minimax theorem. (See Le Cam 
[28], pages 24-25, for a discussion of connections of 
this theorem with other results.) Hájek's result has
been generalized in various directions, and at this 
time the third approach is perhaps the dominant 
one. 

2.4 Informal Comparison 

The difference between the first and third approaches 
is not always clear-cut. If an estimator performs well
at a parameter point $\theta$ but much worse at $\theta$'s neighbors,
we can react to this in two ways: either eliminate
the estimator from competition (first approach)
or punish the estimator by declaring its performance
at $\theta$ to be its worst performance at $\theta$ and its neighbors
(third approach). The former option is implemented
as, for example, the requirement of weak
uniform convergence in [39, 52] (discussed in Section 
2.1), the requirement of continuous convergence in 
[42] and requirement (3.7) in [51]. Apart from this 
borderline situation, the objections against the first 
approach quoted in Section 2.1 appear to be valid. 

The second and third approaches are convincing in 
different circumstances. The third approach is con- 
vincing when our a priori expectations for various 
values of $\theta$ are diffuse. In the Bayesian case, where
these expectations are expressed via a full-blown
prior distribution, this distribution should not assign
a positive weight to any specific value of $\theta$.
If the expectations are not diffuse (e.g., when the
value $\theta = 0$ corresponds to no difference between two
treatments, the statistician might want to assign to 
it a positive probability), or too uncertain for us to 
judge how diffuse they are, the third approach be- 
comes less convincing. 

2.5 Contribution of this Paper 

Our main result, Theorem 1 in the next section, 
answers a natural question: can a given parameter 
point $\theta$ be a point of superefficiency? Hotelling's and
Hodges's examples, (1) and (2), work for any $\theta$, but
if $\theta$ is noncomputable, the resulting estimators are
also noncomputable, and their existence is of no use.
Our theorem says that no computable estimator can
be superefficient at a noncomputable point. In this
way the notion of computability further limits the
damage inflicted by Hotelling's objection: yes, superefficiency
(in its most extreme form, $T_n = \theta$ for
all $n$) is possible at computable points $\theta$, but there
can be no superefficiency at the other $\theta$.

3. COMPUTABILITY OF POINTS OF 
SUPEREFFICIENCY FOR 
COMPUTABLE ESTIMATORS 

Let $\Omega_1, \Omega_2, \ldots$ be a sequence of measurable spaces,
and for each $n \in \{1, 2, \ldots\}$, let $\{P_{n,\theta} \mid \theta \in \Theta\}$ be a
statistical model on $\Omega_n$. We will assume that $\Theta$ is
an open interval of the real line ($\Theta = \mathbb{R}$ is allowed).
Each $P_{n,\theta}$ is a probability measure on $\Omega_n$, and $\theta \in \Theta$
is the parameter to be estimated. Little will be lost
if the reader assumes that $\Omega_n = \Omega^n$ and $P_{n,\theta} = (P_\theta)^n$
for all $n$, which corresponds to independent observations
chosen from an observation space $\Omega$ according
to $P_\theta$. An estimator $T = \{T_n\}_{n=1}^\infty$ for $\{P_{n,\theta}\}$ is a
sequence of measurable functions $T_n : \Omega_n \to \Theta$.

We will need a condition of regularity, which will
be stated in terms of a natural measure of closeness
between probability measures. Formally, the affinity
between probability measures $P$ and $Q$ on the same
measurable space $\Omega$ is defined by

(3) $\pi(P, Q) := \inf_E \max(P(E), Q(\Omega \setminus E))$,

$E$ ranging over the measurable sets in $\Omega$. Notice
that:

• it is always true that $\pi(P, Q) \in [0, 1]$ and $\pi(P, P) \in [1/2, 1]$;

• probability measures $P$ and $Q$ are mutually singular
if and only if $\pi(P, Q) = 0$;

• sequences of probability measures $P_n$ and $Q_n$ on
measurable spaces $\Omega_n$ are asymptotically entirely
separated if and only if $\liminf_{n\to\infty} \pi(P_n, Q_n) = 0$.

Another ingredient of our regularity condition will
be a continuous function $I : \Theta \to (0, \infty)$ (typically,
Fisher's information).

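Assumption 1. For any $\epsilon > 0$ and any $\theta \in \Theta$,
there exist a positive integer $N$ and a neighborhood
$O \subseteq \Theta$ of $\theta$ such that, for all $n \ge N$ and $\theta_1, \theta_2 \in O$,

(4) $\pi(P_{n,\theta_1}, P_{n,\theta_2}) \ge \Phi\left(-\frac{\sqrt{n I(\theta)}}{2} |\theta_2 - \theta_1|\right) - \epsilon$,

where $\Phi$ is the $N(0, 1)$ distribution function.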

Assumption 1 is a weak form of the uniform con- 
dition of local asymptotic normality. It will be dis- 
cussed in the next section. In Section 4.2 we will 
see that it is satisfied for statistical models satisfy- 
ing standard regularity conditions (cf. the reference 
to [19] there). As a simple sanity test, in Sections 
4.1 and 4.3 we check it for the Gaussian model with 
known variance. 
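For the Gaussian model with known variance 1 the affinity (3) can be evaluated numerically. The sketch below is an illustration only: it assumes that attention can be restricted to threshold sets $E = \{\bar{x} > t\}$ for the sample mean $\bar{x}$ (a Neyman-Pearson type reduction that is not spelled out here), in which case the value is $\Phi(-\sqrt{n}|\theta_2 - \theta_1|/2)$, the right-hand side of (4) with $I = 1$ and $\epsilon = 0$. The function names and the grid size are hypothetical.

```python
import math


def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_affinity(theta1: float, theta2: float, n: int) -> float:
    """Closed form Phi(-sqrt(n)|theta2 - theta1|/2), assuming threshold sets suffice."""
    return Phi(-0.5 * math.sqrt(n) * abs(theta2 - theta1))


def affinity_over_thresholds(theta1: float, theta2: float, n: int,
                             grid: int = 2001) -> float:
    """Minimize max(P(E), Q(complement of E)) over E = {sample mean > t} on a grid of t."""
    lo, hi = min(theta1, theta2), max(theta1, theta2)
    root_n = math.sqrt(n)
    best = 1.0
    for k in range(grid):
        t = lo + (hi - lo) * k / (grid - 1)
        p_lo = Phi(-root_n * (t - lo))  # probability of E under the smaller mean
        p_hi = Phi(root_n * (t - hi))   # probability of the complement under the larger mean
        best = min(best, max(p_lo, p_hi))
    return best
```

With an odd grid size the midpoint of the two means is one of the thresholds tried, and the two functions should agree up to rounding error.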

Theorem 1. Suppose $\{T_n\}$ is a computable estimator
for $\{P_{n,\theta}\}$ satisfying Assumption 1. For any
$c > 0$ and any noncomputable $\theta \in \Theta$,

(5) $\limsup_{n\to\infty} P_{n,\theta}(|T_n - \theta| \ge c n^{-1/2}) \ge \Phi(-c\sqrt{I(\theta)})$.

As we said earlier, computability is discussed in 
Appendix A. The reader who is only interested in 
the countability of points of superefficiency (Corol- 
lary 2 below) can ignore all statements about com- 
putability; the proof of Theorem 1 will still show 
that there are only countably many points of superefficiency
under Assumption 1. A streamlined independent
proof of the countability of points of superefficiency
is given in Appendix B.

If $(\hat\theta_n - \theta) n^{1/2}$ is asymptotically $N(0, 1/I(\theta))$ under
$P_{n,\theta}$ for some estimator $\hat\theta_n$, such as the maximum
likelihood estimator, we will have an "almost
opposite" inequality to (5),

(6) $\limsup_{n\to\infty} P_{n,\theta}(|\hat\theta_n - \theta| \ge c n^{-1/2}) \le 2\Phi(-c\sqrt{I(\theta)})$.



The use of probabilities $P_{n,\theta}(|T_n - \theta| \ge c n^{-1/2})$ for
measuring the concentration of estimators is very 
standard in the literature discussed in Section 2: cf., 
for example, Le Cam's discussion of concentration 
in [26] (starting from page 288), Wolfowitz's Theo- 
rem [52] (pages 258-259), Schmetterer's [42] Theo- 
rem 2.2, Pfanzagl's [34] Theorems 1 and 2. 

There is a gap between the right-hand sides of (5)
and (6). To eliminate it, we can consider only large
values of $c$. Let us define the asymptotic efficiency of
an estimator $T = \{T_n\}$ at a parameter point $\theta \in \Theta$
as

(7) $\mathrm{ae}_\theta(T) := \liminf_{c\to\infty} \liminf_{n\to\infty} \frac{-\ln P_{n,\theta}(|T_n - \theta| \ge c n^{-1/2})}{c^2 I(\theta)/2}$

(with convention $-\ln 0 := \infty$). Since $-\ln\Phi(-x) \sim x^2/2$
as $x \to \infty$ (see, e.g., [10], Lemma VII.2), (6)
implies that $\mathrm{ae}_\theta(\hat\theta) \ge 1$. In this sense the maximum
likelihood estimators are efficient under the usual
regularity conditions. We can say that $T$ is superefficient
at $\theta$ if $\mathrm{ae}_\theta(T) > 1$. Inequality (5) implies
$\mathrm{ae}_\theta(T) \le 1$.

In the classical case of $(T_n - \theta) n^{1/2}$ asymptotically
normal with parameters $(0, v(\theta))$, $\mathrm{ae}_\theta(T) =
1/(I(\theta) v(\theta))$, and so, under the usual regularity conditions,
$\mathrm{ae}_\theta(T)$ is the ratio of the asymptotic variance
of the rescaled maximum likelihood estimator
to the asymptotic variance of rescaled $T_n$. Therefore,
in this case $\mathrm{ae}_\theta(T)$ is the classical asymptotic efficiency
of $T$ at $\theta$, as defined by Fisher [11] (page 316)
and Cramér [8] (Section 32.5).
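The identity $\mathrm{ae}_\theta(T) = 1/(I(\theta) v(\theta))$ in the asymptotically normal case can be checked numerically. The following is a minimal sketch, assuming that the inner limit in (7) has already been taken, so that the probability in the numerator equals $2\Phi(-c/\sqrt{v})$; the function names are hypothetical, and `info` and `v` stand for $I(\theta)$ and $v(\theta)$.

```python
import math


def log_Phi_neg(x: float) -> float:
    """ln Phi(-x) for x >= 0, computed via erfc for accuracy in the far tail.

    math.erfc underflows to 0 for very large x (beyond roughly 38),
    so this sketch is only meant for moderate values of x.
    """
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))


def efficiency_ratio(c: float, info: float, v: float) -> float:
    """The ratio in (7) after n -> infinity for an N(0, v) limit."""
    log_tail = math.log(2.0) + log_Phi_neg(c / math.sqrt(v))  # ln(2 Phi(-c / sqrt(v)))
    return -log_tail / (c * c * info / 2.0)
```

For `info = 1` and `v = 2` the ratio approaches $1/2$ as `c` grows, but the convergence is slow because of the logarithmic correction in $-\ln\Phi(-x)$; this is one way to see why (7) takes the limit $c \to \infty$.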

Before proving Theorem 1, we state three simple 
corollaries of it, all assuming Assumption 1. 

Corollary 1. If the parameter point $\theta$ is noncomputable
and a computable estimator $\{T_n\}$ is such
that $(T_n - \theta) n^{1/2}$ weakly converges to $N(0, v(\theta))$,
then $v(\theta) \ge 1/I(\theta)$.

Proof. It suffices to notice that $1/(I(\theta) v(\theta)) =
\mathrm{ae}_\theta(T) \le 1$. □

Corollary 2. If $c > 0$ and $\{T_n\}$ is an estimator
for $\{P_{n,\theta}\}$, the inequality

(8) $\limsup_{n\to\infty} P_{n,\theta}(|T_n - \theta| \ge c n^{-1/2}) < \Phi(-c\sqrt{I(\theta)})$

holds for at most countably many $\theta \in \Theta$. In particular,
$\mathrm{ae}_\theta(T) > 1$ for at most countably many $\theta$.
In particular, if $(T_n - \theta) n^{1/2}$ weakly converges to
$N(0, v(\theta))$ for all $\theta \in \Theta$, then $v(\theta) < 1/I(\theta)$ holds
for at most countably many $\theta$.

The last part of Corollary 2 was also proved (un- 
der different regularity conditions) in [49], the Ap- 
pendix. 

PROOF of Corollary 2. Every estimator is 
computable with respect to some oracle (see Section
A.4 of Appendix A for information about oracles).
Fix such an oracle for $\{T_n\}$. Theorem 1 will
continue to hold if computability is replaced by com- 
putability with respect to this oracle. Finally, there 
are only countably many parameter points that are 
computable with respect to this oracle. □ 

As mentioned earlier, a proof of Corollary 2 not 
using the notions of computability and oracle can 
be extracted from the proof of Theorem 1. See Ap- 
pendix B for details. 

We can define the asymptotic estimability $\mathrm{ae}_\theta$ of
a parameter point $\theta$ as

$\mathrm{ae}_\theta := \sup_T \mathrm{ae}_\theta(T)$,

with $T$ ranging over the computable estimators. The
following corollary is a formalization of a new all-or-nothing
phenomenon arising in our algorithmic
framework.

Corollary 3. Suppose there is a computable
estimator $\hat\theta_n$ satisfying (6). Then, for each $\theta \in \Theta$,
either $\mathrm{ae}_\theta = 1$ or $\mathrm{ae}_\theta = \infty$.

Proof. By (6) we have $\mathrm{ae}_\theta \ge 1$. In combination
with (5), this gives $\mathrm{ae}_\theta = 1$ for noncomputable $\theta$. If
$\theta$ is computable, setting (1) gives $\mathrm{ae}_\theta = \infty$. □

The formalization given by Corollary 3 is imperfect,
because we have much more than $\mathrm{ae}_\theta = \infty$ for
computable $\theta$: there is a computable estimator that
estimates $\theta$ with zero error, $T_n = \theta$.
3.1 The Role of Computability

In Theorem 1 and its corollaries the notion of 
computability is applied to two kinds of objects: es- 
timators and parameter points. There is, however, 
an important philosophical difference between com- 
putable estimators and computable parameter points. 

The purpose of estimators is to be used for com- 
puting estimates, and so their computability is es- 
sential. Accordingly, in our discussion we restrict 
ourselves to computable estimators. 

A parameter point is not meant to be computed 
by anybody. Depending on which school of statistics 
we listen to, it is either a constant chosen by Na- 
ture or a mathematical fiction. In any case, there is 
no reason for us to require or expect that it should 
be computable. In fact, noncomputable parameter 
points are often more important than computable 
ones: for example, if the parameter point is chosen 
from a smooth prior on the real line, it will be non- 
computable with probability one. 

Theorem 1 implies that, for the standard regu- 
lar statistical models, the maximum likelihood es- 
timator is efficient (cannot be beaten by any other 
computable estimator) at noncomputable parame- 
ter points. This statement remains important de- 
spite the complementary statement that the max- 
imum likelihood estimator is greatly outperformed 
by Hotelling's and Hodges's estimators at computable 
parameter points. 

3.2 Proof of Theorem 1 

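The proof will use the following implication of Assumption 1.

Lemma 1. Let $c > 0$, $\alpha \in (0, 1)$, $\theta \in \Theta$, and $I > I(\theta)$.
There exist $\epsilon > 0$, a positive integer $N$, and a
neighborhood $O \subseteq \Theta$ of $\theta$ such that, for any $n \ge N$,
$\theta_1, \theta_2 \in O$, and $A_1, A_2 \subseteq \Omega_n$ satisfying

(9) $\max(P_{n,\theta_1}(A_1), P_{n,\theta_2}(A_2)) \le \alpha\Phi(-c\sqrt{I})$

and

(10) $|\theta_2 - \theta_1| \le 2(1+\epsilon)^3 c n^{-1/2}$,

it is true that $A_1 \cup A_2 \ne \Omega_n$.

Proof. Let $\epsilon > 0$ be so small that

$\Phi(-(1+\epsilon)^3 c\sqrt{I}) - \epsilon > \alpha\Phi(-c\sqrt{I})$.

Take any $N$ and $O$ satisfying the condition in Assumption 1.
Using $I(\theta) < I$ and (10), we now obtain,
for $n \ge N$,

$\pi(P_{n,\theta_1}, P_{n,\theta_2}) \ge \Phi\left(-\frac{\sqrt{n I(\theta)}}{2} |\theta_2 - \theta_1|\right) - \epsilon$
$\ge \Phi(-(1+\epsilon)^3 c\sqrt{I}) - \epsilon$
$> \alpha\Phi(-c\sqrt{I})$.

Were it true that $A_1 \cup A_2 = \Omega_n$, (9) would imply the
opposite inequality

$\pi(P_{n,\theta_1}, P_{n,\theta_2}) \le \max(P_{n,\theta_1}(A_1), P_{n,\theta_2}(\Omega_n \setminus A_1))$
$\le \max(P_{n,\theta_1}(A_1), P_{n,\theta_2}(A_2))$
$\le \alpha\Phi(-c\sqrt{I})$. □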



Proof of Theorem 1. Fix some $c > 0$ and
$\theta \in \Theta$ such that (5) fails. We will exhibit an algorithm
for computing $\theta$, which will prove the theorem.
There exist $\alpha \in (0, 1)$, $I > I(\theta)$, and $N$ such
that, for all $n \ge N$,

(11) $P_{n,\theta}(|T_n - \theta| \ge c n^{-1/2}) < \alpha\Phi(-c\sqrt{I})$.

Let $\epsilon > 0$, $N$, and $O \ni \theta$ satisfy the condition in
Lemma 1 [so that $N$ is assumed to be large enough
both for (11) to hold and for the condition in Lemma
1 to be satisfied].

Choose an open interval $(L, R) \subseteq O$ with rational
end-points such that $\theta \in (L, R)$ and $I(q) < I$ for all
$q \in (L, R)$. In what follows we will also impose some
other conditions on the interval $(L, R)$ (it has to be
sufficiently short).

Let $(\theta_1, \theta_2) \subseteq (L, R)$ be an open interval containing
$\theta$. We will construct, in a computable manner,
an open interval $(\theta_1', \theta_2') \subseteq (L, R)$ whose length is at
most $(1+\epsilon)^{-1} |\theta_2 - \theta_1|$ and which still contains $\theta$.
Repeating this operation, we can compute $\theta$ to any
accuracy starting from $(L, R)$: the length of the interval
known to contain $\theta$ will tend to zero exponentially
fast.

First notice that we can compute a positive integer
$n$ such that

(12) $2(1+\epsilon)^2 c n^{-1/2} \le |\theta_2 - \theta_1| \le 2(1+\epsilon)^3 c n^{-1/2}$,

and that we can assume that the resulting $n$ satisfies
$n \ge N$. Indeed, the double inequality (12) is
equivalent to

(13) $\frac{4(1+\epsilon)^4 c^2}{(\theta_2 - \theta_1)^2} \le n \le \frac{4(1+\epsilon)^6 c^2}{(\theta_2 - \theta_1)^2}$,

so making $(L, R) \supseteq (\theta_1, \theta_2)$ sufficiently short will ensure
the existence of $n$ satisfying (12) and the inequality
$n \ge N$ for all such $n$.

Let us say that a point $q \in \Theta$ is suitable if

(14) $P_{n,q}(|T_n - q| \ge c n^{-1/2}) < \alpha\Phi(-c\sqrt{I})$.

Let $\Theta' = \Theta'_{n,\theta_1,\theta_2}$ be the set of all suitable points in
$(\theta_1, \theta_2)$. We know that $\theta \in \Theta'$; see (11).

Let us show that $|q_2 - q_1| < (1+\epsilon)^{-2} |\theta_2 - \theta_1|$ for
all $q_1, q_2 \in \Theta'$. This follows from Lemma 1: setting
$A_i := \{\omega \mid |T_n(\omega) - q_i| \ge c n^{-1/2}\}$, $i = 1, 2$, and using
(12) and (14), we can see that there exists $\omega \in \Omega_n \setminus
(A_1 \cup A_2)$. Since

$|T_n(\omega) - q_1| < c n^{-1/2}$

and

$|T_n(\omega) - q_2| < c n^{-1/2}$,

the triangle inequality and (12) imply

$|q_2 - q_1| < 2 c n^{-1/2} \le (1+\epsilon)^{-2} |\theta_2 - \theta_1|$.

The estimator $\{T_n\}$ is computable, and so we can
compute an open interval $(\theta_1', \theta_2') \supseteq \Theta'$ of length
$|\theta_2' - \theta_1'| \le (1+\epsilon)^{-1} |\theta_2 - \theta_1|$. (See Section A.2 of Appendix
A for details.) This completes the proof of the theorem. □
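The choice of $n$ in (12) and (13) is the only step of the construction that is purely arithmetical, and it can be sketched in code. The sketch below is an illustration under simplifying assumptions: it works in floating-point arithmetic rather than with exact rational end-points, it ignores the additional requirement $n \ge N$, and the function names `choose_n` and `satisfies_12` are hypothetical.

```python
import math
from typing import Optional


def choose_n(theta1: float, theta2: float, c: float, eps: float) -> Optional[int]:
    """Smallest positive integer n in the range (13), or None if the range has none."""
    delta_sq = (theta2 - theta1) ** 2
    lower = 4.0 * (1.0 + eps) ** 4 * c * c / delta_sq
    upper = 4.0 * (1.0 + eps) ** 6 * c * c / delta_sq
    n = max(1, math.ceil(lower))
    return n if n <= upper else None


def satisfies_12(n: int, theta1: float, theta2: float, c: float, eps: float) -> bool:
    """Check the double inequality (12) directly."""
    delta = abs(theta2 - theta1)
    scale = 2.0 * c / math.sqrt(n)
    return (1.0 + eps) ** 2 * scale <= delta <= (1.0 + eps) ** 3 * scale
```

For a short interval such as `(0.0, 0.013)` with `c = 1.0` and `eps = 0.1` a suitable `n` exists, whereas for a long interval such as `(0.0, 10.0)` the range (13) contains no positive integer; this is the reason why $(L, R)$ has to be sufficiently short.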



3.3 Le Cam's Example

In one of the examples in [26] (Example 4), Le
Cam constructs an uncountable set $S$ and an estimator
$\{T_n\}$ for the Gaussian model $\{P_{n,\theta}\} = \{(N(\theta, 1))^n\}$
with unknown mean and known variance 1 such
that: (a) for all $\theta \in S$ and for all $n$ of the form
$n = 7^{2j}$, $j = 2, 3, \ldots$,

$P_{n,\theta}(|T_n - \theta| \ge n^{-1/2}) \le P_{n,\theta}(|\hat\theta_n - \theta| \ge n^{-1/2}) - 0.18$,

and (b) for all $\theta \notin S$ and for all sufficiently large $n$
of the form $n = 7^{2j}$,

$P_{n,\theta}(|T_n - \theta| \ge n^{-1/2}) \le P_{n,\theta}(|\hat\theta_n - \theta| \ge n^{-1/2})$,

$\hat\theta_n$ being the maximum likelihood estimator. This
does not contradict our Corollary 2 for two reasons:
first, our definition of $\mathrm{ae}_\theta(T)$ involves the probabilities
$P_{n,\theta}(|T_n - \theta| \ge c n^{-1/2})$ for large $c$, whereas Le
Cam arbitrarily fixes $c := 1$, and second, the restriction
to $n = 7^{2j}$ invalidates our proof, which crucially
depends on the set of allowed $n$ being sufficiently
dense [see (13)].

At the end of his construction of $\{T_n\}$, Le Cam
says that a different, "necessarily more complicated,"
construction will give another estimator and another
uncountable set that enjoy similar properties to $\{T_n\}$
and $S$ without the restriction to $n = 7^{2j}$. This would
eliminate the second reason, but an independent verification
of Le Cam's claim is desirable: there is no
proof in [26], and even the proofs in that paper are
"quelquefois incorrectes" ([27], page 17).

4. REGULARITY CONDITIONS

The present section and Appendix A are devoted
to the regularity conditions imposed on the sequence
of statistical models $\{P_{n,\theta}\}$ and the estimator $\{T_n\}$,
respectively. The status of these two sets of regularity
conditions is very different: whereas the conditions
imposed on the estimator should be minimal
(cf. Section 2.1), we can be much more flexible in
choosing the conditions imposed on the sequence of
statistical models: even if these conditions are relatively
strong, they are still likely to be satisfied by
many important models (cf. the discussion in [52],
Section 3).

In this section we will see that Assumption 1, essentially
saying that

$\liminf_{\theta_1, \theta_2 \to \theta,\; n \to \infty} \left[ \pi(P_{n,\theta_1}, P_{n,\theta_2}) - \Phi\left(-\frac{\sqrt{n I(\theta)}}{2} |\theta_2 - \theta_1|\right) \right] \ge 0$

for all $\theta \in \Theta$, follows from easier to check or more
standard assumptions.

4.1 In Terms of the Likelihood Ratio 

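We make the standard assumption that for each $n$ all $P_{n,\theta}$ are absolutely continuous with respect to a $\sigma$-finite measure $\mu_n$. Let $f_{n,\theta}$ be a density of $P_{n,\theta}$ with respect to $\mu_n$.

Assumption 2. For any $\epsilon > 0$ and any $\theta \in \Theta$, there exist a positive integer $N$ and a neighborhood $O \subseteq \Theta$ of $\theta$ such that, for all $n \ge N$ and for all distinct $\theta_1, \theta_2 \in O$,

$$P_{n,\theta_1}\left(\frac{f_{n,\theta_2}}{f_{n,\theta_1}} > 1\right) \ge \Phi\left(-\frac{|\theta_2-\theta_1|\sqrt{nI(\theta)}}{2}\right) - \epsilon. \tag{15}$$
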
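Lemma 2. Assumption 2 implies Assumption 1.

Proof. By symmetry, we can complement (15) by

$$P_{n,\theta_2}\left(\frac{f_{n,\theta_1}}{f_{n,\theta_2}} > 1\right) \ge \Phi\left(-\frac{|\theta_2-\theta_1|\sqrt{nI(\theta)}}{2}\right) - \epsilon.$$

Therefore, it suffices to prove

$$\pi(P_{n,\theta_1},P_{n,\theta_2}) \ge \min\left(P_{n,\theta_1}\left(\frac{f_{n,\theta_2}}{f_{n,\theta_1}} > 1\right), P_{n,\theta_2}\left(\frac{f_{n,\theta_1}}{f_{n,\theta_2}} > 1\right)\right). \tag{16}$$

Suppose this inequality is false, and so we can find $t$ such that

$$\pi(P_{n,\theta_1},P_{n,\theta_2}) < t < \min\left(P_{n,\theta_1}\left(\frac{f_{n,\theta_2}}{f_{n,\theta_1}} > 1\right), P_{n,\theta_2}\left(\frac{f_{n,\theta_1}}{f_{n,\theta_2}} > 1\right)\right).$$

In this case, there exists an event $E$ such that

$$P_{n,\theta_1}(E) < t, \qquad P_{n,\theta_1}\left(\frac{f_{n,\theta_2}}{f_{n,\theta_1}} > 1\right) > t,$$

$$P_{n,\theta_2}(E) > 1 - t, \qquad P_{n,\theta_2}\left(\frac{f_{n,\theta_2}}{f_{n,\theta_1}} > 1\right) < 1 - t.$$

This contradicts the Neyman-Pearson lemma. □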

An easy calculation shows that Assumption 2 is satisfied for sampling from the Gaussian family $N(\theta,\sigma^2)$ with known variance $\sigma^2 > 0$ and with $I(\theta) := \sigma^{-2}$ [in other words, with $I(\theta)$ Fisher's information]. In fact, for this statistical model we have

$$P_{n,\theta_1}\left(\frac{f_{n,\theta_2}}{f_{n,\theta_1}} > 1\right) = \Phi\left(-\frac{|\theta_2-\theta_1|\sqrt{n}}{2\sigma}\right),$$

with an equality and without the need to subtract $\epsilon$. Therefore, Assumption 2 and, a fortiori, Assumption 1 are not vacuous.
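
The Gaussian identity above can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names, the Monte Carlo sample size and the parameter values in the usage note are illustrative assumptions. It relies on the elementary fact that, for this family, the likelihood ratio $f_{n,\theta_2}/f_{n,\theta_1}$ exceeds 1 exactly when the sample mean is closer to $\theta_2$ than to $\theta_1$.

```python
import math
import random


def std_normal_cdf(x):
    """Standard Gaussian distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def closed_form(theta1, theta2, sigma, n):
    """Right-hand side: Phi(-|theta2 - theta1| * sqrt(n) / (2 * sigma))."""
    return std_normal_cdf(-abs(theta2 - theta1) * math.sqrt(n) / (2.0 * sigma))


def simulated(theta1, theta2, sigma, n, trials=100_000, seed=0):
    """Monte Carlo estimate of P_{n,theta1}(f_{n,theta2} / f_{n,theta1} > 1).

    Only the sample mean matters, and under theta1 it is distributed as
    N(theta1, sigma^2 / n), so it is drawn directly (illustrative shortcut).
    """
    rng = random.Random(seed)
    sd_of_mean = sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        mean = rng.gauss(theta1, sd_of_mean)
        # Likelihood ratio > 1 iff the mean is closer to theta2 than to theta1.
        if abs(mean - theta2) < abs(mean - theta1):
            hits += 1
    return hits / trials
```

For instance, `closed_form(0.0, 0.3, 1.0, 25)` and `simulated(0.0, 0.3, 1.0, 25)` should agree up to Monte Carlo error.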

4.2 Local Asymptotic Normality 

Another assumption that implies Assumption 1 is the following uniform version of the condition of local asymptotic normality.

Assumption 3. For any $\theta \in \Theta$, any $\lambda \ge 0$, any sequence $\theta_i \to \theta$ of elements of $\Theta$, any sequence $n_i \to \infty$ of positive integers, and any sequence $\lambda_i \to \lambda$ of positive real numbers such that $\theta_i + \lambda_i/\sqrt{n_i I(\theta_i)} \in \Theta$ for all $i = 1,2,\ldots$, there exist sequences $\Delta_i$ and $\psi_i$ of random variables such that

$$\ln\frac{f_{n_i,\theta_i+\lambda_i/\sqrt{n_i I(\theta_i)}}}{f_{n_i,\theta_i}} = \lambda\Delta_i - \frac{\lambda^2}{2} + \psi_i \tag{17}$$

and:

• the distribution of $\Delta_i$ with respect to $P_{n_i,\theta_i}$ weakly converges to $N(0,1)$;

• $\psi_i$ converges to 0 in $P_{n_i,\theta_i}$-probability.

The derivation of a slightly stronger version of Assumption 3 under standard regularity conditions can be found in [19] (Definition II.2.2, Theorem II.1.2 and Remark II.1.4).
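
As a sanity check (a worked example added for illustration, not taken from [19]), the Gaussian family $N(\theta,\sigma^2)$ with $I(\theta) = \sigma^{-2}$ satisfies the expansion of Assumption 3 with an explicit remainder. The symbols $\bar x$ (the sample mean of $x_1,\ldots,x_{n_i}$) and $h_i := \lambda_i\sigma/\sqrt{n_i}$ are introduced only for this computation.

```latex
\begin{align*}
\ln\frac{f_{n_i,\theta_i+h_i}}{f_{n_i,\theta_i}}
 &= \frac{1}{2\sigma^2}\sum_{k=1}^{n_i}
    \bigl[(x_k-\theta_i)^2-(x_k-\theta_i-h_i)^2\bigr] \\
 &= \frac{n_i h_i(\bar x-\theta_i)}{\sigma^2}-\frac{n_i h_i^2}{2\sigma^2}
  = \lambda_i\Delta_i-\frac{\lambda_i^2}{2},
 \qquad \Delta_i := \frac{\sqrt{n_i}\,(\bar x-\theta_i)}{\sigma}, \\
 &= \lambda\Delta_i-\frac{\lambda^2}{2}+\psi_i,
 \qquad \psi_i := (\lambda_i-\lambda)\Delta_i-\frac{\lambda_i^2-\lambda^2}{2}.
\end{align*}
```

Here $\Delta_i$ is exactly $N(0,1)$ under $P_{n_i,\theta_i}$, and $\psi_i \to 0$ in probability because $\lambda_i \to \lambda$.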

Lemma 3. Assumption 3 implies Assumption 1. 

Proof. Suppose that Assumption 3 holds whereas Assumption 1 does not hold. The latter implies that there exist $\epsilon > 0$ and $\theta \in \Theta$ such that for each positive integer $N$ and each neighborhood $O$ of $\theta$ there exist $n \ge N$ and $\theta_1,\theta_2 \in O$ for which (4) is violated. Fix such $\epsilon$ and $\theta$. There exist sequences $n_i \to \infty$ of positive integers and $\theta_i \to \theta$, $\theta'_i \to \theta$ of elements of $\Theta$ such that, for all $i$, $\theta_i < \theta'_i$ and

$$\pi(P_{n_i,\theta_i},P_{n_i,\theta'_i}) < \Phi\left(-\frac{(\theta'_i-\theta_i)\sqrt{n_i I(\theta)}}{2}\right) - \epsilon. \tag{18}$$

It is clear that $\theta$ in (18) can be replaced by $\theta_i$ (slightly decrease $\epsilon$ and disregard the initial $i$'s if necessary). Setting $\lambda_i := (\theta'_i - \theta_i)\sqrt{n_i I(\theta_i)}$, we can rewrite (18) as

$$\pi(P_{n_i,\theta_i},P_{n_i,\theta'_i}) < \Phi\left(-\frac{\lambda_i}{2}\right) - \epsilon. \tag{19}$$

The last inequality shows that the sequence $\lambda_i$ is bounded. Therefore, we can assume, without loss of generality, that $\lambda_i \to \lambda$ for some $\lambda \ge 0$ (consider a subsequence of $i$ if necessary). Fix sequences $\Delta_i$ and $\psi_i$ satisfying the conditions in Assumption 3.

Notice that $\lambda > 0$: indeed, if $\lambda$ were zero, (17) would converge to zero in probability, which would contradict (19). Therefore,
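
$$P_{n_i,\theta_i}\left(\frac{f_{n_i,\theta'_i}}{f_{n_i,\theta_i}} > 1\right) = P_{n_i,\theta_i}\left(\Delta_i > \frac{\lambda}{2} - \frac{\psi_i}{\lambda}\right) \to \Phi\left(-\frac{\lambda}{2}\right). \tag{20}$$

In a similar way we can obtain

$$P_{n_i,\theta'_i}\left(\frac{f_{n_i,\theta_i}}{f_{n_i,\theta'_i}} > 1\right) \to \Phi\left(-\frac{\lambda}{2}\right). \tag{21}$$

Relations (20) and (21) contradict (19) and (16). □

4.3 In Terms of the Variation Distance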

The variation distance $\|P - Q\|$ between two probability measures on the same measurable space $\Omega$ is defined to be

$$\|P - Q\| := \sup_E |P(E) - Q(E)|,$$

$E$ ranging over the measurable sets in $\Omega$. A slightly stronger form of Assumption 1 can be stated in terms of variation distance rather than affinity.

Assumption 4. For any $\epsilon > 0$ and any $\theta \in \Theta$, there exist a positive integer $N$ and a neighborhood $O \subseteq \Theta$ of $\theta$ such that, for all $n \ge N$ and $\theta_1,\theta_2 \in O$,

$$\|P_{n,\theta_1} - P_{n,\theta_2}\| \le 1 - 2\Phi\left(-\frac{|\theta_2-\theta_1|\sqrt{nI(\theta)}}{2}\right) + \epsilon.$$

Theorem 1 remains true if Assumption 1 is replaced by Assumption 4. This follows from the following lemma.

Lemma 4. It is always true that

$$\pi(P,Q) \ge \frac{1 - \|P - Q\|}{2}.$$

Proof. The required inequality

$$\inf_E \max(P(E), Q(\Omega\setminus E)) \ge \frac{1 - \sup_E |P(E) - Q(E)|}{2}$$

follows from

$$\max(P(E), Q(\Omega\setminus E)) \ge \frac{1 - |P(E) - Q(E)|}{2},$$

and the last inequality is true even when the max is replaced by the arithmetic mean. □
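
The inequality in the proof can be verified by brute force on a finite sample space, where every subset is an event. The following is a minimal sketch under that assumption; the function names and the example distributions in the usage note are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def events(k):
    """Yield every subset of {0, ..., k-1} as a tuple of indices."""
    for r in range(k + 1):
        yield from combinations(range(k), r)


def inf_max(p, q):
    """inf over events E of max(P(E), Q(complement of E))."""
    best = 1.0
    for e in events(len(p)):
        p_e = sum(p[i] for i in e)
        q_complement = 1.0 - sum(q[i] for i in e)
        best = min(best, max(p_e, q_complement))
    return best


def variation_distance(p, q):
    """sup over events E of |P(E) - Q(E)|."""
    return max(abs(sum(p[i] - q[i] for i in e)) for e in events(len(p)))


def lemma_holds(p, q, tol=1e-12):
    """Check inf_E max(P(E), Q(E^c)) >= (1 - ||P - Q||) / 2 up to rounding."""
    return inf_max(p, q) >= (1.0 - variation_distance(p, q)) / 2.0 - tol
```

For example, `lemma_holds([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])` checks the bound for two distributions on a three-point space; when `p == q` is uniform on four points the two sides coincide.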

It is easy to check that Assumption 4 is satisfied for sampling from the Gaussian family $N(\theta,\sigma^2)$ with known variance $\sigma^2$ and with $I(\theta) := \sigma^{-2}$. For this model,

$$\|P_{n,\theta_1} - P_{n,\theta_2}\| = 1 - 2\Phi\left(-\frac{|\theta_2-\theta_1|\sqrt{n}}{2\sigma}\right),$$

again with an equality and without the need to subtract $\epsilon$.
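
The closed form for the Gaussian variation distance can be compared with a direct numerical integration. The sketch below is illustrative only: the function names, the integration range and the step count are assumptions. It uses two standard facts: $\sup_E|P(E)-Q(E)|$ equals half the $L_1$ distance between the densities, and the variation distance between the two $n$-fold product measures equals that between the corresponding laws $N(\theta_j,\sigma^2/n)$ of the sample mean, which is a sufficient statistic.

```python
import math


def std_normal_cdf(x):
    """Standard Gaussian distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def tv_closed_form(theta1, theta2, sigma, n):
    """1 - 2 * Phi(-|theta2 - theta1| * sqrt(n) / (2 * sigma))."""
    arg = -abs(theta2 - theta1) * math.sqrt(n) / (2.0 * sigma)
    return 1.0 - 2.0 * std_normal_cdf(arg)


def tv_numeric(theta1, theta2, sigma, n, steps=100_000):
    """Half the L1 distance between the two sample-mean densities (midpoint rule)."""
    sd = sigma / math.sqrt(n)
    lo = min(theta1, theta2) - 10.0 * sd   # tails beyond 10 sd are negligible
    hi = max(theta1, theta2) + 10.0 * sd
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(normal_pdf(x, theta1, sd) - normal_pdf(x, theta2, sd))
    return 0.5 * total * h
```

The two functions should agree to several decimal places, for example at `theta1 = 0.0`, `theta2 = 0.3`, `sigma = 1.0`, `n = 25`.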

5. CONCLUSION 

It is widely accepted that advances in computing have brought about deep changes in the theory and practice of statistics. However, the use of the theory of computing, and, in particular, of its core notion of computability, has been very limited in the classical areas of statistics, such as parameter estimation and hypothesis testing. The notion of computability appears to be especially useful in questions of efficiency and superefficiency, where it allows us to delineate the class of statistical procedures that we would like to compete with. In particular, restricting ourselves to computable estimators, we can ask whether a given parameter point $\theta$ can be a point of superefficiency. Hotelling's and Hodges's examples show that, without this restriction, the answer is vacuous: any $\theta$ can be a point of superefficiency. With the restriction, the answer we gave in Section 3 is that superefficiency is impossible at noncomputable $\theta$, whereas "hyperefficiency" ($T_n = \theta$ for all $n$) is possible at computable $\theta$.

This paper only deals with the most classical aspects of superefficiency. It does not even touch multivariate regular statistical models, let alone models in which rates of convergence of the maximum likelihood estimates are different from $n^{-1/2}$ and nonparametric models. Can the notion of computability be usefully applied in these and other more complex cases? I hope the answer is positive.

APPENDIX A: COMPUTABILITY 

The first paper to propose a general notion of computability, and to claim that its notion of computability is general, was Church's [6] (1936). Church considered functions $F: \mathbb{N}^m \to \mathbb{N}$, where $\mathbb{N}$ is the set of all positive integer numbers; for now, we will be only interested in the case $m = 1$. On the one hand, he formally defined his class of computable ("effectively calculable," as he said) functions, and on the other hand, he put forward the informal thesis (often referred to as the Church thesis) that his formal notion is the formalization of our intuitive notion of computability. One of Church's arguments in favor of the Church thesis was that two natural but very different definitions of effectively calculable functions, Church and Kleene's $\lambda$-definability and Herbrand and Gödel's recursiveness, are equivalent.

The Church thesis was further boosted by Alan Turing's observation [46] that Church's effective calculability is equivalent to computability using a formal model of a computing device, nowadays known as the Turing machine. A similar computing device was introduced at the same time by Emil Post [35], and another, rather different one, was introduced later by Andrei Markov, Jr. [30]; both devices led to the same class of computable functions as the Turing machine.

In 1953 Andrei Kolmogorov [22], later joined by 
his student Vladimir Uspensky [23], carefully an- 
alyzed the notion of an algorithm and introduced 
its very general formalization. Kolmogorov and Us- 
pensky's goal was to show that "the most general, 
for the current state of science, notion of an algo- 
rithm" (my translation) leads to the same class of 
computable functions. As they had expected, their 
formalization (along with several other definitions 
they considered but did not include in the paper) 
indeed turned out to be equivalent to the previous 
ones. 

At this time, there is a consensus that the intuitive notion of computability for functions $F: \mathbb{N} \to \mathbb{N}$ is indeed captured by the numerous available equivalent definitions. This notion will be assumed to be known in the rest of this paper; precise definitions can be found in, for example, Rogers's classical book [41].

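A set $A \subseteq \mathbb{N}$ is called decidable if the function

$$x \mapsto \begin{cases} 1, & \text{if } x \in A, \\ 2, & \text{otherwise,} \end{cases}$$

is computable. A function $F: A \to B$, where $A$ and $B$ are decidable subsets of $\mathbb{N}$, is said to be computable if its extension $F': \mathbb{N} \to \mathbb{N}$ defined as

$$F'(x) := \begin{cases} F(x), & \text{if } x \in A, \\ 1, & \text{otherwise,} \end{cases}$$

is computable.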

Many familiar countable sets $X$, such as $\mathbb{N}^2$, the set $\mathbb{Q}$ of all rational numbers, the set of all open intervals $(a,b) \subseteq \mathbb{R}$ with rational end-points, etc., can be represented as "spaces of finite objects" (in the terminology of Shoenfield [43]) by fixing a canonical injection $\phi_X: X \to \mathbb{N}$ mapping $X$ onto a decidable subset of $\mathbb{N}$. For example, a popular bijection $\phi_{\mathbb{N}^2}: \mathbb{N}^2 \to \mathbb{N}$ is the Cantor pairing function; it turns $\mathbb{N}^2$ into a space of finite objects. The reader will be assumed to be familiar with such canonical injections $\phi_X$ for the standard spaces of finite objects $X$. Intuitively, $\phi_X(x)$ encodes $x \in X$ as a positive integer, and instead of working with finite objects $x \in X$ directly, we can work with their codes.
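
To make the coding of pairs concrete, here is a minimal sketch of the Cantor pairing function and its inverse. The function names are hypothetical, and the shift by one is an assumption made here so that the codes are positive integers, in line with the convention that $\mathbb{N}$ consists of positive integers; the usual textbook formula is stated for nonnegative integers.

```python
import math


def cantor_pair(x, y):
    """Encode a pair of positive integers as a single positive integer."""
    if x < 1 or y < 1:
        raise ValueError("x and y must be positive integers")
    a, b = x - 1, y - 1                      # shift to the 0-based formula
    return (a + b) * (a + b + 1) // 2 + b + 1


def cantor_unpair(z):
    """Decode a positive integer back into the pair it encodes."""
    if z < 1:
        raise ValueError("z must be a positive integer")
    z0 = z - 1
    w = (math.isqrt(8 * z0 + 1) - 1) // 2    # index of the diagonal containing z0
    b = z0 - w * (w + 1) // 2
    a = w - b
    return a + 1, b + 1
```

Both directions use only integer arithmetic, so the coding and decoding are exact for arbitrarily large arguments.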

The computability of $F: X \to Y$, where $X$ and $Y$ are spaces of finite objects, is defined as the computability of $\phi_Y \circ F \circ \phi_X^{-1}: \phi_X(X) \to \phi_Y(Y)$. A set $A \subseteq X$, where $X$ is a space of finite objects, is said to be recursively enumerable if $A = F(\mathbb{N})$ for some computable function $F: \mathbb{N} \to X$.



A.1 Computable Real Numbers

The main goal of Turing's paper [46] was, in fact, not the definition of computable functions but the definition of computable real numbers. Turing's definition was that a real number is computable if its decimal expansion is computable. There are many equivalent definitions. For example, a real number $t$ is computable if and only if there exists a computable function $F: \mathbb{N} \to \mathbb{N}$ such that $|t - F(n)/n| < 1/n$ for all $n \in \mathbb{N}$. This notion of a computable real number is as uncontroversial as the notion of a computable function $F: \mathbb{N} \to \mathbb{N}$.
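
As a small illustration of the last characterization, the following sketch exhibits such a function for $t = \sqrt{2}$. The choice of $t$ and the concrete formula $F(n) = \lfloor n\sqrt{2}\rfloor$ are assumptions made for this example only; the floor is computed with exact integer arithmetic, so no floating-point error enters.

```python
import math


def F(n):
    """Return floor(n * sqrt(2)) for a positive integer n, computed exactly."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # floor(n * sqrt(2)) is the integer square root of 2 * n^2.
    return math.isqrt(2 * n * n)


def witnesses_bound(n):
    """Check F(n)/n <= sqrt(2) < (F(n) + 1)/n by comparing squares of integers."""
    return F(n) ** 2 <= 2 * n * n < (F(n) + 1) ** 2
```

Since $F(n)/n \le \sqrt{2} < (F(n)+1)/n$, the approximation error $|\sqrt{2} - F(n)/n|$ is strictly less than $1/n$, which is the required bound.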

Theorem 1 talks about computability of two objects: the estimator $\{T_n\}$ and the parameter point $\theta$. We have just defined what the computability of $\theta$ means. The situation with $\{T_n\}$ is more complicated. Typically, $T_n: \mathbb{R}^n \to \mathbb{R}$, and the notion of computability of real-valued functions of real numbers is notoriously ill-defined. There is the "core" notion of a computably continuous function, to be discussed in Section A.3, but there is no consensus about the "right" definition for more general classes of functions. In the next subsection we define computable estimators in an ad hoc manner, in order to obtain a strong statement of Theorem 1.

A.2 Computable Estimators

The theory of computability over the real numbers often uses "effective" (i.e., computable in some sense) versions of various topological notions, such as openness, closedness, continuity, etc. A set $A \subseteq \mathbb{R}$ is said to be effectively open if it is the union of a recursively enumerable set of open intervals with rational end-points. In other words, $A$ is effectively open if it can be represented in the form $A = \bigcup_i (a_i,b_i)$, where $(a_i,b_i)$, $i \in \mathbb{N}$, is a computable sequence of open intervals with rational end-points. Complements $\mathbb{R} \setminus A$ of effectively open sets $A \subseteq \mathbb{R}$ are called effectively closed. More generally, sets $A_x \subseteq \mathbb{R}$ indexed by $x \in X$, where $X$ is a space of finite objects, are said to be effectively open uniformly in $x$ if they can be represented in the form $A_x = \bigcup_i (a_i(x),b_i(x))$, where $(i,x) \mapsto (a_i(x),b_i(x))$ is a computable function mapping $\mathbb{N} \times X$ to the set of open intervals with rational end-points. In this case the complementary sets $\mathbb{R} \setminus A_x$ are said to be effectively closed uniformly in $x$.
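
A concrete instance may help. The sketch below enumerates open intervals with rational end-points whose union is $\mathbb{R} \setminus \{\sqrt{2}\}$, which shows that this set is effectively open and hence that the singleton $\{\sqrt{2}\}$ is effectively closed. The example set and the names `complement_of_sqrt2` and `first_intervals` are assumptions introduced for illustration.

```python
import math
from fractions import Fraction
from itertools import islice


def complement_of_sqrt2():
    """Enumerate rational open intervals (a, b) whose union is R minus {sqrt(2)}."""
    k = 2                                   # start at 2 so every interval is non-empty
    while True:
        floor_k_sqrt2 = math.isqrt(2 * k * k)
        below = Fraction(floor_k_sqrt2, k)          # below sqrt(2), tends to it
        above = Fraction(floor_k_sqrt2 + 1, k)      # above sqrt(2), tends to it
        yield (Fraction(-k), below)         # covers more and more of (-inf, sqrt(2))
        yield (above, Fraction(k))          # covers more and more of (sqrt(2), inf)
        k += 1


def first_intervals(count):
    """Return the first `count` intervals of the enumeration as a list."""
    return list(islice(complement_of_sqrt2(), count))
```

Every real number other than $\sqrt{2}$ eventually falls into one of the enumerated intervals, while $\sqrt{2}$ itself is never covered because it is irrational and the end-points approach it strictly from either side.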

Now we can define our notion of a computable estimator. As in Sections 3 and 4, we consider a sequence of statistical models $\{P_{n,\theta}\}$, where $\theta$ ranges over an open interval $\Theta \subseteq \mathbb{R}$; the end-points of $\Theta$ are assumed computable (by definition, $-\infty$ and $\infty$ are computable). Let $\{T_n\}$ be an estimator for $\{P_{n,\theta}\}$. We say that $\{T_n\}$ is computable if, for each $\delta \in \mathbb{Q} \cap [0,1)$, the closures

$$\overline{\{q \in \Theta \mid P_{n,q}(|T_n - q| > u) \le \delta\}} \tag{22}$$

are effectively closed uniformly in $n \in \mathbb{N}$ and $u \in \mathbb{Q} \cap [0,\infty)$. Intuitively, the inequality "$\le$" in (22) means that $T_n$ is a good estimator when the true parameter point is $q$, with $\delta$ and $u$ determining how demanding our notion of "good" is. The closure of the set of such $q$ is required to be uniformly effectively closed. It seems obvious that this condition will be satisfied for estimators $\{T_n\}$ specified by an explicit procedure.

Notice that our definition of computability of an estimator $\{T_n\}$ is in fact a joint requirement on the estimator and $\{P_{n,\theta}\}$. Interestingly, it does not impose any computability restrictions on the sample spaces $\Omega_n$, which do not enter the definition explicitly.

It is easy to check that our definition of computability of $\{T_n\}$ agrees with the definition of computability of a parameter point $\theta \in \Theta$, in the sense that the two notions coincide when $T_n := \theta$ is a constant estimator. Indeed, in this case,

$$\{q \mid P_{n,q}(|T_n - q| > u) \le \delta\} = [\theta - u, \theta + u] \cap \Theta,$$

and the last family of closed sets are effectively closed uniformly in $u \in \mathbb{Q} \cap [0,\infty)$ if and only if $\theta$ is computable.

Let us now check that the proof of Theorem 1 goes through for our definition of computability of $\{T_n\}$. Without loss of generality, we assume that $c$ and $\epsilon$ in the proof of Theorem 1 are rational numbers and that $\alpha$ and $I$ are chosen in such a way that $\delta := \alpha\Phi(-c\sqrt{I})$ [cf. (11)] is a rational number; we have already said that $(L,R)$ is an interval with rational end-points. The requirement (13) leaves us enough freedom to make $n$ a square (i.e., to make $n^{1/2}$ integer), assuming that the interval $(L,R)$ is sufficiently short. Therefore, the closure of the set of $q \in \Theta$ satisfying (14) is effectively closed uniformly in the squares $n \ge N$. Assuming that $(\theta_1,\theta_2)$ is an interval with rational end-points [this is true initially, for $(\theta_1,\theta_2) = (L,R)$], we can compute a new interval $(\theta'_1,\theta'_2) \supseteq S$ with rational end-points of length $|\theta'_2 - \theta'_1| \le (1+\epsilon)^{-1}|\theta_2 - \theta_1|$. (Just use the definition of effective closedness and the compactness of bounded closed intervals in $\mathbb{R}$.)



A.3 Computable Continuity

This subsection discusses the traditional notion 
of computability over the reals going back to the 
work of Brouwer on the intuitionistic foundations of 
mathematics; see [31] for an excellent description. 
Grzegorczyk [15] showed that this traditional notion of computability is equivalent to several other definitions considered in the literature. An advantage of his
exposition is that it is firmly based on the standard 
foundations of mathematics. The term "computable 
continuity" (in the form "computable continuous") 
is Grzegorczyk's ([15], footnote on page 71), who be- 
lieved that it is possible to introduce some kinds of 
computable real functions which are not continuous. 

Intuitively, a function $F$ defined over the reals is computably continuous if we can compute $F(x)$ to an arbitrary accuracy when given $x$ to an arbitrary accuracy. This condition indeed implies the continuity of $F$: for example, the simplest discontinuous function

$$F(x) := \begin{cases} 1, & \text{if } x \ge 0, \\ 0, & \text{otherwise,} \end{cases}$$

can never be computed to accuracy 1/3 at the point $x = 0$, no matter how accurately we know $x$. On the other hand, any explicitly given continuous function the reader is likely to come across will be computable.
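
By contrast with the step function, a continuous function such as $x \mapsto x^2$ can be computed to any accuracy from sufficiently accurate rational approximations of $x$. The following is a minimal sketch of that idea, not a general framework: the names `sqrt2_approx` and `square`, and the convention that `approx(k)` returns a rational within $2^{-k}$ of $x$, are assumptions made for this illustration.

```python
import math
from fractions import Fraction


def sqrt2_approx(k):
    """Return a rational within 2**-k of sqrt(2) (exact integer arithmetic)."""
    return Fraction(math.isqrt(2 * 4**k), 2**k)


def square(approx, m):
    """Return a rational within 2**-m of x**2, given approx(k) within 2**-k of x."""
    bound = abs(approx(0)) + 1               # |x| <= bound
    # If |x - r| <= 2**-k, then |x + r| <= 2*bound + 1, hence
    # |x**2 - r**2| = |x - r| * |x + r| <= (2*bound + 1) * 2**-k.
    k = m
    while (2 * bound + 1) * Fraction(1, 2**k) > Fraction(1, 2**m):
        k += 1
    r = approx(k)
    return r * r
```

For example, `square(sqrt2_approx, 20)` returns a rational number within $2^{-20}$ of 2. No analogous procedure exists for the step function at $x = 0$, since no finite accuracy on $x$ decides on which side of the jump $x$ lies.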

We start from defining what it means for a sequence of statistical models $\{P_{n,\theta}\}$ to be computably continuous (in the topology of weak convergence). As before, we assume that $\theta$ ranges over an open interval $\Theta$ of the real line $\mathbb{R}$ with computable end-points, and, for concreteness, we also assume that $P_{n,\theta}$ is a probability measure on $\Omega_n = \mathbb{R}^n$.

A basic set in $\mathbb{R}^m$ is the product $\prod_{i=1}^m (a_i,b_i)$ of bounded open intervals with rational end-points. An elementary set in $\mathbb{R}^m$ is a finite union of basic sets. The family of basic sets and the family of elementary sets can be regarded as spaces of finite objects. A subset of $\mathbb{R}^m$ is effectively open if it is the union of a recursively enumerable family of basic sets. A function $F: \Theta \to \mathbb{R}$ is computably lower semicontinuous if the set $\{(\theta,t) \mid F(\theta) > t\}$ is effectively open. The uniform versions of effective openness and computable lower semicontinuity are defined as before. A sequence of statistical models $\{P_{n,\theta}\}$ is said to be computably continuous if the function $P_{n,\theta}(E)$ is computably lower semicontinuous in $\theta$ uniformly in $n \in \mathbb{N}$ and elementary sets $E \subseteq \Omega_n$. This is a weak condition; the statistical models usually found in statistics textbooks are computably continuous.

Fix a computably continuous sequence of statistical models $\{P_{n,\theta}\}$. Let $\{T_n\}$ be an estimator for $\{P_{n,\theta}\}$. It is computably continuous if both sets $\{(x,t) \in \Omega_n \times \mathbb{R} \mid T_n(x) > t\}$ and $\{(x,t) \in \Omega_n \times \mathbb{R} \mid T_n(x) < t\}$ are effectively open uniformly in $n \in \mathbb{N}$.

It is not difficult to check that all computably continuous estimators are computable in our sense [see (22)]. In fact, for computably continuous estimators the operation of closure in (22) is superfluous: already the sets $\{q \mid P_{n,q}(|T_n - q| > u) \le \delta\}$ are effectively closed uniformly in $n$ and $u$.

A.4 Computability with an Oracle

The important idea of computability with an oracle was introduced by Turing [47]. An oracle Turing machine is allowed to read a tape containing an infinite sequence $S$ of symbols, not necessarily computable. Replacing in all our definitions computable functions $F: \mathbb{N} \to \mathbb{N}$ with $S$-computable functions $F: \mathbb{N} \to \mathbb{N}$ (i.e., functions computable by oracle Turing machines allowed to read $S$) leads to the notions of $S$-computable real numbers, $S$-computable estimators, $S$-computably continuous estimators, etc. Theorem 1 remains true if the two entries of "computable" are replaced by "$S$-computable." Since every estimator is $S$-computable for some $S$ (see Lemma 5 below), this "relativized" version of Theorem 1 contains Corollary 2 as a special case.



Lemma 5. Every estimator is $S$-computable for some $S$.

Proof. Let $(a_i,b_i)$, $i = 1,2,\ldots$, be a computable enumeration of all open intervals with rational end-points. It suffices to take as $S$ an infinite binary sequence encoding the function $F: \mathbb{N} \times \mathbb{N} \times (\mathbb{Q} \cap [0,\infty)) \times (\mathbb{Q} \cap [0,1)) \to \{0,1\}$ defined by the requirement that $F(i,n,u,\delta) = 1$ if and only if $(a_i,b_i)$ and (22) are disjoint. □

APPENDIX B: DIRECT PROOF OF COROLLARY 2

This appendix gives a proof of Corollary 2 that does not use the notion of computability. It parallels the proof of Theorem 1.

Inequality (8) implies that there exist $\alpha \in (0,1)$, $I > I(\theta)$, and $N$ such that inequality (11) holds for all $n \ge N$. Since $\alpha$ and $I$ can be taken rational, it suffices to prove for fixed $\alpha$, $I$ and $N$ that (11) holds for all $n \ge N$ only for countably many $\theta$.

Suppose $\theta$ satisfies (11) for all $n \ge N$. It suffices to prove that there exists an open interval $(L,R) \ni \theta$ such that $\theta$ is the only point in $(L,R)$ satisfying (11) for all $n \ge N$. Let $\epsilon > 0$, $N$ and $O \ni \theta$ satisfy the condition in Lemma 1. Take any $(L,R) \subseteq O$ satisfying

$$\frac{4(1+\epsilon)^4c^2}{(R-L)^2} \ge N, \qquad \frac{4(1+\epsilon)^6c^2}{(R-L)^2} - \frac{4(1+\epsilon)^4c^2}{(R-L)^2} \ge 1, \tag{23}$$

and $I(q) < I$ for all $q \in (L,R)$.

Suppose $(L,R)$ contains two distinct points $\theta_1$ and $\theta_2$ satisfying (11) for all $n \ge N$. Choose a positive integer $n$ satisfying (12) and $n \ge N$. Such an $n$ exists since (12) is equivalent to (13) and we have assumed (23). By Lemma 1, (12) and (14) (applied to $q = \theta_1$ and $q = \theta_2$), there exists $\omega \in \Omega_n$ such that

$$|T_n(\omega) - \theta_1| \le cn^{-1/2} \quad\text{and}\quad |T_n(\omega) - \theta_2| \le cn^{-1/2};$$

therefore, the triangle inequality and (12) imply

$$|\theta_2 - \theta_1| \le 2cn^{-1/2} \le (1+\epsilon)^{-2}|\theta_2 - \theta_1|,$$

which is impossible.